Abstract

Introduction: Single-nucleotide polymorphisms (SNPs) are biomarkers for
exploring the genetic basis of many complex human diseases. The prediction
of SNPs is promising in modern genetic analysis, but identifying the
functional SNPs in a disease-related gene remains a great challenge.
Computational approaches help to address this challenge: they have increased
the success rate of genetic association studies and reduced the cost of
genotyping. The objective of this study is to identify deleterious
non-synonymous SNPs (nsSNPs) associated with the COL1A1 gene.

Material and methods: The SNPs were retrieved from the Single Nucleotide
Polymorphism Database (dbSNP). Protein stability change was calculated using
I-Mutant. The potentially functional nsSNPs and their effect on proteins were
predicted by PolyPhen and SIFT, respectively. FASTSNP was used for estimation
of risk score.

Results: Our analysis revealed 247 SNPs as non-synonymous, out of which
5 nsSNPs were found to be least stable by I-Mutant 2.0 with a DDG value
below -1.0. Four nsSNPs, namely rs17853657, rs17857117, rs57377812 and rs1059454,
showed a highly deleterious tolerance index score of 0.00 with a change in their
physicochemical properties by the SIFT server. Seven nsSNPs, namely rs1059454,
rs8179178, rs17853657, rs17857117, rs72656340, rs72656344 and rs72656351,
were found to be probably damaging with a PSIC score difference between 2.0 and
3.5 by the PolyPhen server. Three nsSNPs, namely rs1059454, rs17853657 and
rs17857117, were found to be highly polymorphic with a risk score of 3-4 with
a possible effect of non-conservative change and splicing regulation by FASTSNP.

Conclusions: Three nsSNPs, namely rs1059454, rs17853657 and rs17857117, are
potential functional polymorphisms that are likely to have a functional impact
on the COL1A1 gene.

Introduction

Osteoporosis, a serious skeletal disease commonly observed among 
the elderly, is associated with substantial morbidity and socio-economic 
burden [1, 2]. In the United States alone, more than 40 million people either 
already have osteoporosis or are at high risk due to low bone mass 
[http://www.niams.nih.gov/Health_Info/Bone/Osteoporosis/osteoporosis_ff.asp].
In Saudi Arabia, a high prevalence of osteoporosis in the elderly has






Tariq Ahmad Masoodi, Mohammed A. Alsaif, Sulaiman A. Al Shammari, Adel A. Alhamdan 



been observed [3]. It is diagnosed when the bone 
mineral density (BMD) is greater than 2.5 standard 
deviations below peak bone mass according to the 
criteria of the World Health Organization [4]. Osteo- 
porosis can occur in both men and women and at 
any age, but it is most common in older women. It 
has been reported that one in two women and one 
in 5 men over the age of 50 sustain fractures due to 
osteoporosis [3]. Collagen type I alpha 1 (COL1A1)
encodes the α1 chain, the primary subunit of type I
collagen, the main structural and most abundant
protein in bone. Within this gene, > 400 human
disease-associated mutations have been identified,
the majority of which are linked to osteoporosis.
The COL1A1 gene is a strong functional candidate
for the genetic regulation of bone mass and
susceptibility to fragility fractures [5].

Single-nucleotide polymorphisms (SNPs) are the
most common form of DNA sequence variation and
are widely used for mapping complex genetic traits.
About 500,000 SNPs fall within the coding regions of
the human genome. Among these, the non-synonymous
SNPs cause changes in the amino acid residues.
These are likely to be an important factor
contributing to the functional diversity of the encoded
proteins in the human population [6]. It has been
shown that non-synonymous SNPs (nsSNPs)
affect the functional roles of proteins in the signal
transduction of visual, hormonal, and other stimuli
[7, 8]. These nsSNPs can affect gene expression
by modifying DNA and transcription factor binding
[9, 10], and can inactivate active sites of enzymes or
change splice sites, thereby producing defective
gene products [11, 12].

Epidemiological association studies focus a great 
amount of effort on identifying SNPs in genes that 
may have an association with disease risk, and 
often the SNPs that have an association with di- 
sease are non-synonymous. Many molecular epi- 
demiological studies focus on studying SNPs found 
in coding regions in the hope of finding significant 
association between SNPs and disease suscepti- 
bility, but often find little or no association [13]. With 
the availability of high-throughput SNP detection 
techniques, the population of nsSNPs is increasing 
rapidly, providing a platform for studying the rela- 
tionship between genotypes and phenotypes of 
human diseases. Our ability to better select an 
nsSNP for an association study can be enhanced 
by first examining the potential impact an amino 
acid variant may have on the function of the encoded
protein using different SNP prediction programs
such as I-Mutant, Sorting Intolerant from Tolerant
(SIFT) and Polymorphism Phenotyping (PolyPhen) [13].
Discovering the deleterious nsSNPs out of a pool of 
all the SNPs will be very useful for epidemiological 
population-based studies. 

The main aim of this study is therefore to identify
deleterious nsSNPs associated with the COL1A1 gene.



Material and methods 

Methodology 

The methodology used was the same as described
earlier [6, 13, 14].

SNP dataset from dbSNP 

This computational analysis used the Single Nucleotide
Polymorphism Database (dbSNP)
(http://www.ncbi.nlm.nih.gov/SNP/) to identify SNPs and their
related protein sequence for the COL1A1 gene [15].

Analysis of protein stability change
by I-Mutant 2.0

We predicted nsSNPs causing protein stability
change using the I-Mutant 2.0 tool [16] available
from the University of Bologna
(http://gpcr.biocomp.unibo.it/). I-Mutant 2.0 is a support vector machine
(SVM) based tool for the automatic prediction
of protein stability change upon single amino
acid substitution. The protein stability change
was predicted from the COL1A1 protein sequence
(NP_000079). The software computed the predicted
free energy change value and its sign (DDG), which is
calculated as the unfolding Gibbs free energy
value of the mutated protein minus the unfolding
Gibbs free energy value of the native protein
(kcal/mol). A positive DDG value indicates that the
mutated protein is more stable than the native
protein, and a negative value that it is less stable.
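
As a minimal sketch of the sign convention described above, the following Python snippet computes DDG as the unfolding free energy of the mutated protein minus that of the native protein and reports the direction of the predicted stability change. The function names and the free energy values in the usage line are hypothetical and serve only as an illustration; they are not part of I-Mutant 2.0, which predicts DDG directly from the submitted sequence.

```python
def ddg(dg_unfold_mutant: float, dg_unfold_native: float) -> float:
    """DDG (kcal/mol): unfolding free energy of the mutant minus that of the native protein."""
    return dg_unfold_mutant - dg_unfold_native


def stability_call(ddg_value: float) -> str:
    """Map the sign of DDG to the direction of the predicted stability change."""
    if ddg_value > 0:
        return "increased stability"
    if ddg_value < 0:
        return "decreased stability"
    return "no predicted change"


# Hypothetical free energies (kcal/mol), chosen only to illustrate the sign.
example = ddg(dg_unfold_mutant=3.0, dg_unfold_native=5.0)
print(example, stability_call(example))
```

A substitution with a negative DDG is therefore reported as destabilizing, which is the criterion applied to the nsSNPs in the Results.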

Evaluation of coding single nucleotide 
polymorphisms 

There are many web-based resources available 
that allow one to predict whether non-synonymous 
coding SNPs may have functional effects on proteins.
We chose SIFT [17], available from http://sift.jcvi.org/,
to perform protein conservation analysis and
predict the phenotypic effect of amino acid
substitutions. SIFT is based on the premise that
protein evolution is correlated with protein func- 
tion. Variants that occur at conserved alignment 
positions are expected to be tolerated less than 
those that occur at diverse positions. The algorithm 
uses a modified version of PSIBLAST [18] and Di- 
richlet mixture regularization [19] to construct 
a multiple sequence alignment of proteins that can 
be globally aligned to the query sequence and 
belong to the same clade. The underlying principle 
of this program is that it generates alignments with 
a large number of homologous sequences and 
assigns scores to each residue, ranging from zero 
to one. SIFT scores < 0.05 are predicted by the
algorithm to be intolerant or deleterious amino acid
substitutions, whereas scores > 0.05 are considered
tolerant [20]. The higher the tolerance index of
a particular amino acid substitution, the smaller is its
likely impact.
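
The cutoff described above can be sketched in a few lines of Python. This is an illustration only, not part of the SIFT software: the function name is hypothetical, and because the text defines only scores below and above 0.05, the handling of a score of exactly 0.05 (treated here as tolerant) is an assumption. The four example scores are taken from Table I.

```python
SIFT_CUTOFF = 0.05  # threshold stated in the text


def sift_call(tolerance_index: float) -> str:
    """Classify a SIFT tolerance index (0 to 1) as Intolerant or Tolerant."""
    if not 0.0 <= tolerance_index <= 1.0:
        raise ValueError("tolerance index must lie between 0 and 1")
    # Assumption: a score of exactly 0.05 is treated as tolerant.
    return "Intolerant" if tolerance_index < SIFT_CUTOFF else "Tolerant"


# Tolerance index values as listed in Table I.
scores = {
    "rs1059454": 0.00,
    "rs17853657": 0.03,
    "rs66893386": 0.06,
    "rs1135345": 0.89,
}
calls = {snp: sift_call(score) for snp, score in scores.items()}
for snp, call in calls.items():
    print(snp, scores[snp], call)
```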

Simulation of functional change
in nsSNP by PolyPhen server

PolyPhen [21] available from Harvard School of 
Medicine (http://genetics.bwh.harvard.edu/pph/) 
is a computational tool for identification of poten- 
tially functional nsSNPs. Predictions are based on 
a combination of phylogenetic, structural and 
sequence annotation information characterizing 
a substitution and its position in the protein. For 
a given amino acid variation, PolyPhen performs 
several steps: (a) extraction of sequence-based fea- 
tures of the substitution site from the UniProt data- 
base, (b) calculation of profile scores for two amino 
acid variants, (c) calculation of structural parame- 
ters and contacts of a substituted residue. PolyPhen 
scores were classified as 'benign' or 'probably da- 
maging' [22]. Input options for the PolyPhen server 
are protein sequence or accession number together 
with sequence position with two amino acid vari- 
ants. We submitted the query in the form of a pro- 
tein sequence with mutational position and two 
amino acid variants. PolyPhen searches for three- 
dimensional protein structures, multiple alignments 
of homologous sequences and amino acid contact 
information in several protein structure databases. 
Then it calculates position-specific independent 
count (PSIC) scores for each of two variants, and 
computes the difference of the PSIC scores of the 
two variants. The higher a PSIC score difference, the 
higher the functional impact a particular amino acid 
substitution is likely to have. A PSIC score difference 
of 1.5 or above is considered to be damaging. 
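
The PSIC-based rule above can be illustrated with a short Python sketch. It makes two assumptions that the text does not spell out: the difference is taken as an absolute value, and any difference below 1.5 is labelled benign. The function names and the two profile scores in the usage line are hypothetical; the real scores are computed by the PolyPhen server from sequence alignments.

```python
PSIC_DAMAGING_CUTOFF = 1.5  # threshold stated in the text


def psic_difference(psic_variant_1: float, psic_variant_2: float) -> float:
    """Absolute difference between the PSIC profile scores of the two variants (assumption)."""
    return abs(psic_variant_1 - psic_variant_2)


def polyphen_call(difference: float) -> str:
    """Label a PSIC score difference; values below the cutoff are assumed benign."""
    return "damaging" if difference >= PSIC_DAMAGING_CUTOFF else "benign"


# Hypothetical profile scores for the two amino acid variants at one position.
diff = psic_difference(2.0, -0.5)
print(diff, polyphen_call(diff))
```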

Analysis of functional nsSNPs and estimation 
of risk score by FASTSNP 

The Functional Analysis and Selection Tool for 
Single Nucleotide Polymorphism (FASTSNP) is a web 
server (http://fastsnp.ibms.sinica.edu.tw/) which 
connects many programs and databases for pro- 
cessing analysis [23]. We used FASTSNP for the 
prediction of the functional effect of nsSNPs and 
estimation of their risk score. FASTSNP uses a deci- 
sion tree for prioritizing the functional effect and 
estimating risk score. The nsSNPs were submitted 
for FASTSNP analysis and output files were displa- 
yed as a decision tree. 

Results 

SNP dataset from dbSNP

The COL1A1 gene investigated in this work was 
retrieved from the dbSNP database. It contained 
a total of 716 SNPs, of which 247 were nsSNPs, 
25 were synonymous SNPs, and 32 were in non- 
coding regions, which comprise 1 SNP in the 5' UTR 
and 31 SNPs in the 3' UTR. The rest were in the intron 
region. We selected non-synonymous coding SNPs 
for our investigation. 
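
The breakdown above leaves the intronic count implicit. As a small worked check, the following Python lines derive it from the reported totals; the derived figure is simple arithmetic on the numbers given in the text and is not a value stated by the authors.

```python
total_snps = 716
nonsynonymous = 247
synonymous = 25
utr_5_prime = 1
utr_3_prime = 31

# SNPs in untranslated regions, as reported in the text.
noncoding_utr = utr_5_prime + utr_3_prime

# "The rest" of the SNPs, which the text assigns to the intron region.
intronic = total_snps - nonsynonymous - synonymous - noncoding_utr

print("UTR SNPs:", noncoding_utr)
print("Intronic SNPs (derived):", intronic)
```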



Identification of functional nsSNP 
by l-Mutant 2.0 

The more negative the free energy value (DDG
value), the more likely a given point mutation is to be
destabilizing and deleterious. We obtained 23 nsSNPs
that were found to be less stable by this server, as
shown in Table I. Out of these 23 nsSNPs, 5 nsSNPs, namely
rs1059454, rs17853657, rs17857117, rs41316719 and
rs72656344, showed a DDG value below -1.0. The
remaining nsSNPs showed a DDG value between -1.0 and 0, as
depicted in Table I. Out of the 23 nsSNPs that showed
negative DDG, three nsSNPs, namely rs17853657,
rs17857117 and rs57377812, changed their amino
acid from non-polar to polar, and two
nsSNPs, namely rs1059454 and rs72656307, changed
their amino acids from polar to non-polar. Four
nsSNPs, namely rs1135345, rs1800211, rs72656344
and rs72656351, showed a polar to polar change
and the remaining ones a non-polar to non-polar change.
Since the amino acid mutations in the first five
nsSNPs changed their physicochemical properties,
we considered these nsSNPs to be less stable and
deleterious by this analysis.

Predictions of deleterious and damaging 
coding nsSNPs 

Protein conservation analysis was performed 
using a sequence-homology based tool, SIFT. Two 
hundred and forty-seven nsSNPs retrieved from the 
COL1A1 gene were submitted independently to the 
SIFT program to check its tolerance index. Our 
results showed that 19 nsSNPs were deleterious, 
having a tolerance index score of < 0.05. The results 
are shown in Table I. We observed that, out of 
19 deleterious nsSNPs, 12 nsSNPs showed a highly 
deleterious tolerance index score of 0.00. Among 
these 19 deleterious nsSNPs, two nsSNPs showed
a nucleotide change from A→G, one from A→C, one
from C→T, two from C→G and the other 13 from
G→T (Table I). Also, according to the SIFT results,
three nsSNPs, namely rs17853657, rs17857117 and
rs57377812, changed their amino acid from
non-polar to polar, and one nsSNP, namely
rs1059454, changed its amino acid from polar to
non-polar in the mutant protein. We
found that these four nsSNPs that are seen to be
deleterious according to SIFT were also found less
stable by the I-Mutant 2.0 server. Therefore, these
four nsSNPs were found deleterious by this
investigation.

Identification of damaged COL1A1 nsSNPs 
by PolyPhen server 

To identify the COL1A1 nsSNPs that affected 
protein structure, the COL1A1 nsSNPs were analy- 
zed for predicting a possible impact of amino acids 
on the structure and function of the protein using 
the PolyPhen server. The COL1A1 protein sequence
(NP_000079) with each nsSNP position and their
2 amino acid variants was submitted as input for
analyzing the protein structural change due to
amino acids. Our result showed 7 nsSNPs, namely
rs1059454, rs8179178, rs17853657, rs17857117,
rs72656340, rs72656344 and rs72656351, to be
probably damaging, with a PSIC score difference
between 2.0 and 3.5. The nsSNPs rs1059454, rs17853657
and rs17857117, which were observed to lower
protein stability by the I-Mutant 2.0
server and SIFT, were also predicted to be probably
damaging by the PolyPhen server. In addition, the
other four nsSNPs are highly confidently predicted
as probably damaging nsSNPs and the remainder
as benign by PolyPhen (Table I).

Investigation of functional effect 
and estimated risk of COL1A1 nsSNPs 

In order to identify nsSNPs with a high possi- 
bility of having a functional effect, FASTSNP was 



Table I. I-Mutant, SIFT and PolyPhen results of COL1A1

SNP ID      Alleles  Amino acid change  DDG (kcal/mol)  Tolerance index  Predicted impact  PolyPhen prediction
rs1059454   A/C      T1431P             -2.19           0.00             Intolerant        Probably damaging
rs1135345   A/G      E591K              -0.87           0.89             Tolerant          Benign
rs1135348   C/G      G1019A             -0.23           0.00             Intolerant        Benign
rs1800211   A/G      R564H              -0.36           0.36             Tolerant          Benign
rs8179178   G/T      G197C               0.35           0.00             Intolerant        Probably damaging
rs17853657  G/T      P1460H             -1.71           0.03             Intolerant        Probably damaging
rs17857117  C/G      P1438R             -1.12           0.00             Intolerant        Probably damaging
rs41316713  A/G      R1141Q              0.15           0.04             Intolerant        Benign
rs41316719  A/G      V1177I             -1.21           0.00             Intolerant        Benign
rs57377812  C/T      G476R              -0.37           0.01             Intolerant        Benign
rs66548636  G/T      G389C              -0.22           0.32             Tolerant          Benign
rs66761141  G/T      G407C               0.05           0.00             Intolerant        Benign
rs66893386  G/T      G404C              -0.18           0.06             Tolerant          Benign
rs66929517  G/T      G815V              -0.09           0.09             Tolerant          Benign
rs66948146  G/T      G1187V              0.18           0.04             Intolerant        Benign
rs67182491  G/T      G383C               0.33           0.06             Tolerant          Benign
rs67445413  G/T      G866C               0.00           0.04             Intolerant        Benign
rs67682641  G/T      G530C              -0.21           0.00             Intolerant        Benign
rs72656307  C/T      R1093C             -0.35           0.38             Tolerant          Benign
rs72656312  G/T      G1124C              0.73           0.06             Tolerant          Benign
rs72656318  G/T      G1145C             -0.12           0.00             Intolerant        Benign
rs72656321  G/T      G1151V             -0.61           0.02             Intolerant        Benign
rs72656324  G/T      G1166C             -0.44           0.00             Intolerant        Benign
rs72656329  G/T      G1178V             -0.51           0.01             Intolerant        Benign
rs72656331  G/T      G1184V              0.13           0.00             Intolerant        Benign
rs72656340  A/G      M1264V             -0.70           0.78             Tolerant          Probably damaging
rs72656343  G/T      W1312C             -0.05           0.11             Tolerant          Benign
rs72656344  C/T      H1323Y             -2.15           1.00             Tolerant          Probably damaging
rs72656351  G/T      D1441Y             -0.99           0.00             Intolerant        Probably damaging
rs72667029  G/T      G200V              -0.29           0.29             Tolerant          Benign
rs72667031  G/T      G203V               0.37           0.06             Tolerant          Benign
rs72667037  G/T      G221C               0.55           0.03             Intolerant        Benign
rs72667038  G/T      G224C              -0.23           0.06             Tolerant          Benign

SNP IDs in bold are predicted to be highly polymorphic 
applied for the detection of nsSNP influence on
cellular and molecular biological function, e.g.
transcriptional and splicing regulation. In addition, the
risk score was also estimated by
FASTSNP. The functional effect and estimated risk of
COL1A1 nsSNPs are shown in Table II. Eight COL1A1
nsSNPs exhibited a medium-high risk score (risk
score = 3-4). The functional nsSNPs were rs1059454,
rs8179178, rs17853657, rs17857117, rs41316713,
rs41316719, rs72656312 and rs72656329. The
remaining nsSNPs showed low-medium risk (risk score
= 2-3). The two functional nsSNPs (rs72656329 and
rs41316719) detected by FASTSNP were also
predicted to be polymorphic by I-Mutant 2.0 and SIFT.
The nsSNPs rs72656312 and rs41316713 were also
predicted to be deleterious by SIFT. The nsSNP
rs8179178 was also predicted to be functionally
damaging by SIFT and PolyPhen software. But the
most important finding detected by FASTSNP was
the three nsSNPs, namely rs1059454, rs17853657



Table II. Functional effect and estimated risk (FASTSNP)

SNP ID      Alleles  Amino acid change  Possible effect                               Risk score
rs1059454   A/C      T1431P             Non-conservative change, splicing regulation  3-4
rs1135345   A/G      E591K              Conservative change, splicing regulation      2-3
rs1135348   C/G      G1019A             Conservative change, splicing regulation      2-3
rs1800211   A/G      R564H              Conservative change, splicing regulation      2-3
rs8179178   G/T      G197C              Splicing site                                 3-4
rs17853657  G/T      P1460H             Non-conservative change, splicing regulation  3-4
rs17857117  C/G      P1438R             Non-conservative change, splicing regulation  3-4
rs41316713  A/G      R1141Q             Non-conservative change, splicing regulation  3-4
rs41316719  A/G      V1177I             Splicing site                                 3-4
rs57377812  C/T      G476R              Conservative change, splicing regulation      2-3
rs66548636  G/T      G389C              Conservative change                           2-3
rs66761141  G/T      G407C              Conservative change                           2-3
rs66893386  G/T      G404C              Conservative change, splicing regulation      2-3
rs66929517  G/T      G815V              Conservative change, splicing regulation      2-3
rs66948146  G/T      G1187V             Conservative change, splicing regulation      2-3
rs67182491  G/T      G383C              Conservative change, splicing regulation      2-3
rs67445413  G/T      G866C              Conservative change                           2-3
rs67682641  G/T      G530C              Conservative change                           2-3
rs72656307  C/T      R1093C             Conservative change                           2-3
rs72656312  G/T      G1124C             Splicing site                                 3-4
rs72656318  G/T      G1145C             Conservative change, splicing regulation      2-3
rs72656321  G/T      G1151V             Conservative change, splicing regulation      2-3
rs72656324  G/T      G1166C             Conservative change, splicing regulation      2-3
rs72656329  G/T      G1178V             Splicing site                                 3-4
rs72656331  G/T      G1184V             Conservative change, splicing regulation      2-3
rs72656340  A/G      M1264V             Conservative change, splicing regulation      2-3
rs72656343  G/T      W1312C             Conservative change, splicing regulation      2-3
rs72656344  C/T      H1323Y             Conservative change, splicing regulation      2-3
rs72656351  G/T      D1441Y             Conservative change                           2-3
rs72667029  G/T      G200V              Conservative change, splicing regulation      2-3
rs72667031  G/T      G203V              Conservative change, splicing regulation      2-3
rs72667037  G/T      G221C              Conservative change, splicing regulation      2-3
rs72667038  G/T      G224C              Conservative change, splicing regulation      2-3

Note: SNP IDs in bold are predicted to be highly polymorphic 
and rs17857117, that were also found polymorphic
by I-Mutant 2.0, SIFT as well as by PolyPhen.
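
The overlap between the four tools can be reproduced as a set intersection. The sketch below is an illustration rather than the authors' procedure: each set lists the nsSNPs that the text above highlights for the corresponding tool (the five least stable by I-Mutant 2.0, the four deleterious by SIFT with a change in physicochemical properties, the seven probably damaging by PolyPhen, and the eight with a FASTSNP risk score of 3-4). The variable names are hypothetical.

```python
# nsSNPs highlighted by each tool, as named in the Results text.
predictions = {
    "I-Mutant 2.0": {"rs1059454", "rs17853657", "rs17857117",
                     "rs41316719", "rs72656344"},
    "SIFT": {"rs17853657", "rs17857117", "rs57377812", "rs1059454"},
    "PolyPhen": {"rs1059454", "rs8179178", "rs17853657", "rs17857117",
                 "rs72656340", "rs72656344", "rs72656351"},
    "FASTSNP": {"rs1059454", "rs8179178", "rs17853657", "rs17857117",
                "rs41316713", "rs41316719", "rs72656312", "rs72656329"},
}

# nsSNPs flagged by every tool.
consensus = set.intersection(*predictions.values())
print(sorted(consensus))
```

Requiring agreement across all four tools is a conservative choice: it reduces the candidate list to the nsSNPs supported by stability, conservation, structural and risk-score evidence at the same time.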

Discussion 

Our analysis revealed 247 SNPs as non-synonymous,
out of which 5 nsSNPs, namely rs1059454,
rs17853657, rs17857117, rs41316719 and rs72656344,
were found to be least stable by I-Mutant 2.0
with a DDG value below -1.0. Four nsSNPs, namely
rs17853657, rs17857117, rs57377812 and rs1059454,
showed a highly deleterious tolerance index score
of 0.00 with a change in their physicochemical
properties by the SIFT server. Seven nsSNPs, namely
rs1059454, rs8179178, rs17853657, rs17857117,
rs72656340, rs72656344 and rs72656351, were
found to be probably damaging, with a PSIC score
difference between 2.0 and 3.5 by the PolyPhen
server. Three nsSNPs, namely rs1059454, rs17853657
and rs17857117, were found to be highly polymorphic
with a risk score of 3-4 with a possible effect
of non-conservative change and splicing regulation
by FASTSNP.

A major interest in human genetics is to distin- 
guish mutations that are functionally neutral from 
those that contribute to disease. Amino acid substi- 
tutions currently account for approximately half of 
the known gene lesions responsible for human 
inherited disease. Therefore, the identification of 
nsSNPs that affect protein functions and relate to 
disease is an important task. The effect of many 
nsSNPs will probably be neutral as natural selec- 
tion will have removed mutations at essential posi- 
tions. Assessment of non-neutral SNPs is mainly 
based on phylogenetic information (i.e. correlation 
with residue conservation) extended to a certain 
degree with structural approaches. However, there 
is increasing evidence that many human disease 
genes are the result of exonic or non-coding muta- 
tions affecting regulatory regions [14]. Much atten- 
tion has been focused on modeling by different 
methods the possible phenotypic effect of SNPs 
that cause amino acid changes, and only recently 
has interest focused on functional SNPs affecting 
regulatory regions or the splicing process. More- 
over, because of their widespread distribution on 
the species genome, SNPs are particularly important
and valuable as genetic markers in research on
diseases and the corresponding drugs. To date, 
millions of human SNPs have been reported by 
high-throughput methods. The vast number of SNPs 
causes a challenge for biologists and bioinformati- 
cians although they provide a lot of information 
about the relationships between individuals. Besides 
numerous ongoing efforts to identify millions of 
these SNPs, there is now also a focus on studying 
associations between disease risk and these genetic 
variations using a molecular epidemiological approach. 
This plethora of SNPs points out a major difficulty 
faced by scientists in planning costly population-based 



genotyping, which is to choose target SNPs that are 
most likely to affect phenotypic functions and ulti- 
mately contribute to disease development [14]. 

Currently, most molecular studies focus on SNPs 
located in coding and regulatory regions, yet many 
of these studies have been unable to detect signif- 
icant associations between SNPs and disease sus- 
ceptibility. To develop a coherent approach for prior- 
itizing SNP selection for genotyping in molecular 
studies, an evolutionary perspective to SNP scree- 
ning is applied. The hypothesis is that amino acids 
conserved across species are more likely to be func- 
tionally significant. Therefore, SNPs that change 
these amino acids might be more likely to be asso- 
ciated with disease susceptibility. It is becoming 
clear that application of the molecular evolutionary 
approach may be a powerful tool for prioritizing 
SNPs to be genotyped in future molecular epide- 
miological studies [14]. Therefore, our analysis will 
provide useful information in selecting SNPs of the 
COL1A1 gene that are likely to have a potential func- 
tional impact. 

Although computational tools show their poten- 
tial in reducing the number of nsSNPs for disease 
association studies by filtering nsSNPs that are most 
likely to be disease related, error predictions do 
occur. Various computational tools used in this 
analysis determine the functional effects of SNPs 
only with respect to a single biological function. 
Therefore, much time and effort is required from 
researchers to identify the appropriate tools and 
interpret the predictions. There are also some 
aspects affecting the prediction correctness for 
prediction programs like SIFT and PolyPhen. SIFT 
and PolyPhen depend on diverse databases for SNP 
information. Polluted databases with incorrect SNP 
reports and bias of the data towards disease-asso- 
ciated allelic variants are likely to lead to over- 
prediction of the number of deleterious nsSNPs [24]. 
Furthermore, tools finding SNPs may identify base 
alterations between the functional gene and a pseu- 
dogene and mistakenly report these alterations as 
SNPs in the functional protein. Including nsSNPs 
mistakenly mapped from pseudogenes in the SNP 
database will affect the prediction accuracy of pre- 
dictive tools using SNP information from these 
databases [25]. 

In conclusion, in our analysis, three nsSNPs
(rs1059454, rs17853657 and rs17857117) were found
to be less stable, deleterious, probably damaging
and to have a high risk score by I-Mutant 2.0, SIFT,
PolyPhen and FASTSNP, respectively. We therefore
conclude that these three nsSNPs are potentially 
functionally polymorphic. To those conducting large- 
scale population-based epidemiological studies, the 
idea of prioritizing nsSNPs in the investigation of 
association of SNPs with disease risk is of great 
interest. The use of these servers to select poten- 
tially polymorphic nsSNPs for epidemiological stu- 
dies can be an efficient way to explore the role of
genetic variation in disease risk and to curtail cost. 
Furthermore, the predicted impact of these nsSNPs 
can be tested using animal models or cell lines to 
determine whether functionality of the protein has 
indeed been altered. 